The Generalized A* Architecture
We consider the problem of computing a lightest derivation of a global
structure using a set of weighted rules. A large variety of inference problems
in AI can be formulated in this framework. We generalize A* search and
heuristics derived from abstractions to a broad class of lightest derivation
problems. We also describe a new algorithm that searches for lightest
derivations using a hierarchy of abstractions. Our generalization of A* gives a
new algorithm for searching AND/OR graphs in a bottom-up fashion. We discuss
how the algorithms described here provide a general architecture for addressing
the pipeline problem --- the problem of passing information back and forth
between various stages of processing in a perceptual system. We consider
examples in computer vision and natural language processing. We apply the
hierarchical search algorithm to the problem of estimating the boundaries of
convex objects in grayscale images and compare it to other search methods. A
second set of experiments demonstrates the use of a new compositional model for finding salient curves in images.
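As a point of reference for the lightest-derivation framing, here is a minimal Python sketch of best-first search over weighted rules in the style of Knuth's generalization of Dijkstra's algorithm; the paper's A* generalization would additionally add an abstraction-derived heuristic to each queue priority. The rule encoding and the function name are illustrative, not taken from the paper.

```python
import heapq

def lightest_derivation(rules, axioms, goal):
    """Best-first search for the lightest derivation of `goal`.

    rules: list of (antecedents_tuple, consequent, rule_weight)
    axioms: dict mapping axiom item -> base weight
    goal: the item whose lightest derivation cost we want
    """
    best = {}                                  # item -> cheapest cost finalized so far
    queue = [(w, item) for item, w in axioms.items()]
    heapq.heapify(queue)
    while queue:
        cost, item = heapq.heappop(queue)
        if item in best:                       # already finalized with a lighter cost
            continue
        best[item] = cost
        if item == goal:
            return cost
        # fire every rule whose antecedents have all been derived
        for antecedents, consequent, w in rules:
            if item in antecedents and all(a in best for a in antecedents):
                new_cost = w + sum(best[a] for a in antecedents)
                if consequent not in best:
                    heapq.heappush(queue, (new_cost, consequent))
    return None

# toy usage: derive "AB" from "A" and "B" with one rule of weight 1
print(lightest_derivation([(("A", "B"), "AB", 1.0)], {"A": 2.0, "B": 3.0}, "AB"))  # 6.0
```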
Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data
The estimation of viewpoints and keypoints effectively enhances object detection methods by extracting valuable traits of the object instances. While the outputs of the two processes differ, i.e., angles vs. a list of characteristic points, they share the same focus on how the object is placed in the scene, implying a certain level of correlation between them.
Therefore, we propose a convolutional neural network that jointly computes the
viewpoint and keypoints for different object categories. By training both tasks
together, each task improves the accuracy of the other. Since the labelling of object keypoints is very time-consuming for human annotators, we also introduce a new synthetic dataset with automatically generated viewpoint and keypoint annotations. Our proposed network can also be trained on datasets that contain both viewpoint and keypoint annotations or only one of them. The experiments show
that the proposed approach successfully exploits this implicit correlation
between the tasks and outperforms previous techniques that are trained
independently.
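A minimal PyTorch sketch of the joint-training idea described above: a shared backbone with a viewpoint head and a keypoint head, and a combined loss that skips a task when its annotation is missing, so partially labelled datasets can still be used. Layer sizes, the number of viewpoint bins and keypoints, and all names are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointViewpointKeypointNet(nn.Module):
    """Shared backbone with two task heads (illustrative sizes only)."""
    def __init__(self, n_view_bins=24, n_keypoints=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.view_head = nn.Linear(64, n_view_bins)    # viewpoint as classification over angle bins
        self.kp_head = nn.Linear(64, n_keypoints * 2)  # (x, y) per keypoint

    def forward(self, x):
        feat = self.backbone(x)
        return self.view_head(feat), self.kp_head(feat)

def joint_loss(view_logits, kp_pred, view_gt, kp_gt, has_view, has_kp):
    """Combined loss that ignores a task when its annotation is missing."""
    loss = torch.zeros((), device=view_logits.device)
    if has_view.any():
        loss = loss + nn.functional.cross_entropy(view_logits[has_view], view_gt[has_view])
    if has_kp.any():
        loss = loss + nn.functional.mse_loss(kp_pred[has_kp], kp_gt[has_kp])
    return loss
```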
Supervised Versus Unsupervised Deep Learning Based Methods for Skin Lesion Segmentation in Dermoscopy Images
Image segmentation is considered a crucial step in automatic dermoscopic image analysis as it affects the accuracy of subsequent steps. The recent huge progress in deep learning has revolutionized the image recognition and computer vision domains. In this paper, we compare a supervised deep learning based approach with an unsupervised deep learning based approach for the task of skin lesion segmentation in dermoscopy images. Results show that, using the default parameter settings and network configurations proposed in the original approaches, the supervised approach achieves much higher accuracy than the unsupervised approach in terms of Dice coefficient and Jaccard index (77.7% vs. 40% and 67.2% vs. 30.4%, respectively), although the unsupervised approach can detect fine structures of skin lesions on some occasions. With a proposed modification to the unsupervised approach, the Dice and Jaccard values improve to 54.3% and 44%, respectively.
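For reference, the two overlap metrics quoted above have standard definitions; a small NumPy sketch:

```python
import numpy as np

def dice_and_jaccard(pred, gt):
    """Dice coefficient and Jaccard index for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    dice = 2.0 * intersection / (pred.sum() + gt.sum())
    jaccard = intersection / np.logical_or(pred, gt).sum()
    return dice, jaccard

# toy example: two overlapping square lesions on a 10x10 image
pred = np.zeros((10, 10)); pred[2:7, 2:7] = 1
gt   = np.zeros((10, 10)); gt[3:8, 3:8] = 1
print(dice_and_jaccard(pred, gt))   # (0.64, ~0.47)
```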
Multiple Object Tracking in Urban Traffic Scenes with a Multiclass Object Detector
Multiple object tracking (MOT) in urban traffic aims to produce the
trajectories of the different road users that move across the field of view
with different directions and speeds and that can have varying appearances and
sizes. Occlusions and interactions among the different objects are expected and
common due to the nature of urban road traffic. In this work, a tracking
framework employing classification label information from a deep learning
detection approach is used for associating the different objects, in addition
to object position and appearance. We investigate the performance of a modern multiclass object detector for the MOT task in traffic scenes. Results show that the object labels improve tracking performance, but that the outputs of object detectors are not always reliable.
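A hypothetical sketch of how a detector's class label can enter the association step alongside position and appearance; the weights and the simple label penalty are illustrative, not the paper's formulation. A cost like this would typically feed a Hungarian assignment between tracks and detections.

```python
import numpy as np

def association_cost(track, detection, w_pos=1.0, w_app=1.0, w_label=1.0):
    """Combine position, appearance and class-label agreement into one cost.

    track, detection: dicts with 'center' (np.array), 'appearance'
    (np.array, e.g. a colour histogram) and 'label' (str).
    """
    pos_cost = np.linalg.norm(track['center'] - detection['center'])
    app_cost = np.linalg.norm(track['appearance'] - detection['appearance'])
    label_cost = 0.0 if track['label'] == detection['label'] else 1.0
    return w_pos * pos_cost + w_app * app_cost + w_label * label_cost
```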
CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection
Robust face detection in the wild is an essential component for supporting various facial-related problems, e.g. unconstrained face recognition,
facial periocular recognition, facial landmarking and pose estimation, facial
expression recognition, 3D facial model construction, etc. Although the face
detection problem has been intensely studied for decades with various
commercial applications, it still encounters problems in some real-world scenarios due to numerous challenges, e.g. heavy facial occlusions, extremely low resolutions, strong illumination, exceptional pose variations, and image or video
compression artifacts, etc. In this paper, we present a face detection approach
named Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN)
to robustly solve the problems mentioned above. Similar to the region-based
CNNs, our proposed network consists of the region proposal component and the
region-of-interest (RoI) detection component. Unlike those networks, however, our proposed network makes two main contributions that play a significant role in achieving state-of-the-art performance in face detection. Firstly, multi-scale information is grouped in both the region proposal and RoI detection stages to deal with tiny face regions. Secondly, our proposed network allows explicit body contextual reasoning, inspired by the intuition of the human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e. the WIDER FACE Dataset, which contains a high degree of variability, as well as the Face Detection Dataset and
Benchmark (FDDB). The experimental results show that our proposed approach
trained on WIDER FACE Dataset outperforms strong baselines on WIDER FACE
Dataset by a large margin, and consistently achieves competitive results on
FDDB against recent state-of-the-art face detection methods.
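A rough PyTorch sketch of the multi-scale grouping idea: features pooled from several convolutional stages are L2-normalised and concatenated per region, so tiny face regions retain information from earlier, higher-resolution layers. The pooling choice and all names are assumptions for illustration, not the CMS-RCNN architecture itself.

```python
import torch
import torch.nn as nn

class MultiScaleRoIFeatures(nn.Module):
    """Pool, L2-normalise and concatenate per-region features from several conv stages."""
    def __init__(self, out_size=7):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool2d(out_size)

    def forward(self, feature_maps):
        # feature_maps: list of tensors [N, C_i, H_i, W_i], one per conv stage,
        # each already cropped to the same batch of N regions
        pooled = []
        for fmap in feature_maps:
            p = self.pool(fmap)
            norm = p.flatten(1).norm(dim=1).clamp(min=1e-6)[:, None, None, None]
            pooled.append(p / norm)
        return torch.cat(pooled, dim=1)    # [N, sum(C_i), out_size, out_size]
```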
Plan-view Trajectory Estimation with Dense Stereo Background Models
In a known environment, objects may be tracked in multiple views using a set of background models. Stereo-based models can be illumination-invariant, but often have undefined values which inevitably lead to foreground classification errors. We derive dense stereo models for object tracking using long-term, extended dynamic-range imagery, and by detecting and interpolating uniform but unoccluded planar regions. Foreground points are detected quickly in new images using pruned disparity search. We adopt a 'late-segmentation' strategy, using an integrated plan-view density representation. Foreground points are segmented into object regions only when a trajectory is finally estimated, using a dynamic programming-based method. Object entry and exit are optimally determined and are not restricted to special spatial zones.
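A small NumPy sketch of the plan-view density representation mentioned above: foreground points are projected to an overhead grid by dropping the height coordinate. The grid ranges and cell size are illustrative choices, not the paper's parameters.

```python
import numpy as np

def plan_view_density(points_xyz, cell_size=0.1, x_range=(-5, 5), z_range=(0, 10)):
    """Accumulate foreground 3D points into an overhead (plan-view) density map."""
    nx = int((x_range[1] - x_range[0]) / cell_size)
    nz = int((z_range[1] - z_range[0]) / cell_size)
    density = np.zeros((nz, nx))
    for x, y, z in points_xyz:                    # y (height) is dropped in plan view
        ix = int((x - x_range[0]) / cell_size)
        iz = int((z - z_range[0]) / cell_size)
        if 0 <= ix < nx and 0 <= iz < nz:
            density[iz, ix] += 1
    return density
```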
Object Detection Using Strongly-Supervised Deformable Part Models
Deformable part-based models [1, 2] achieve state-of-the-art performance for object detection, but rely on heuristic initialization during training due to the optimization of a non-convex cost function. This paper investigates limitations of such an initialization and extends earlier methods using additional supervision. We explore strong supervision in terms of annotated object parts and use it to (i) improve model initialization, (ii) optimize model structure, and (iii) handle partial occlusions. Our method is able to deal with sub-optimal and incomplete annotations of object parts and is shown to benefit from semi-supervised learning setups where part-level annotation is provided for a fraction of positive examples only. Experimental results are reported for the detection of six animal classes in the PASCAL VOC 2007 and 2010 datasets. We demonstrate significant improvements in detection performance compared to the LSVM [1] and the Poselet [3] object detectors.
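One simple way to picture the strong-supervision idea is to derive part anchors from the subset of positives that carry part annotations. The sketch below averages annotated part offsets relative to the object box; it is a simplification under assumed inputs, not the paper's initialization procedure.

```python
import numpy as np

def mean_part_anchors(object_boxes, part_boxes):
    """Average each part's centre offset (relative to the object box) over the
    examples where that part is annotated.

    object_boxes: [N, 4] float array (x1, y1, x2, y2)
    part_boxes:   dict part_name -> [N, 4] float array, rows of NaN where the
                  part is unannotated or occluded
    """
    anchors = {}
    for name, boxes in part_boxes.items():
        valid = ~np.isnan(boxes).any(axis=1)
        obj, prt = object_boxes[valid], boxes[valid]
        # normalise the part centre by object size so the anchor is scale free
        cx = (prt[:, 0] + prt[:, 2]) / 2 - obj[:, 0]
        cy = (prt[:, 1] + prt[:, 3]) / 2 - obj[:, 1]
        w = obj[:, 2] - obj[:, 0]
        h = obj[:, 3] - obj[:, 1]
        anchors[name] = (np.mean(cx / w), np.mean(cy / h))
    return anchors
```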
Deep Autoencoder for Combined Human Pose Estimation and body Model Upscaling
We present a method for simultaneously estimating 3D human pose and body
shape from a sparse set of wide-baseline camera views. We train a symmetric
convolutional autoencoder with a dual loss that enforces learning of a latent
representation that encodes skeletal joint positions, and at the same time
learns a deep representation of volumetric body shape. We harness the latter to
up-scale input volumetric data by a factor of , whilst recovering a
3D estimate of joint positions with equal or greater accuracy than the state of
the art. Inference runs in real-time (25 fps) and has the potential for passive
human behaviour monitoring where there is a requirement for high fidelity
estimation of human body shape and pose.
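A minimal PyTorch sketch of the dual-loss idea: one latent code is decoded both into skeletal joint positions and into a higher-resolution volume, and the two terms are summed. Layer sizes, the fixed output grid and the number of joints are placeholders, not the paper's network.

```python
import torch
import torch.nn as nn

class PoseShapeAutoencoder(nn.Module):
    """Encoder to a latent code, decoded into joints and a larger voxel grid."""
    def __init__(self, n_joints=17, latent=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent))
        self.joint_head = nn.Linear(latent, n_joints * 3)
        self.volume_head = nn.Sequential(          # decodes to a fixed 32^3 grid
            nn.Linear(latent, 16 * 8 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (16, 8, 8, 8)),
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1))

    def forward(self, volume):
        z = self.encoder(volume)
        return self.joint_head(z), self.volume_head(z)

def dual_loss(joints_pred, volume_pred, joints_gt, volume_hi_res):
    """Sum of a joint-position loss and an upscaled-volume reconstruction loss."""
    return nn.functional.mse_loss(joints_pred, joints_gt) + \
           nn.functional.mse_loss(volume_pred, volume_hi_res)
```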
Grid Loss: Detecting Occluded Faces
Detection of partially occluded objects is a challenging computer vision
problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of
the detection window are occluded, since not every sub-part of the window is
discriminative on its own. To address this issue, we propose a novel loss layer
for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a
convolution layer independently rather than over the whole feature map. This
results in parts being more discriminative on their own, enabling the detector
to recover if the detection window is partially occluded. By mapping our loss
layer back to a regular fully connected layer, no additional computational cost
is incurred at runtime compared to standard CNNs. We demonstrate our method for
face detection on several public face detection benchmarks and show that our
method outperforms regular CNNs, is suitable for realtime applications and
achieves state-of-the-art performance.
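A rough PyTorch sketch of the grid-loss idea: the final feature map is divided into non-overlapping sub-blocks, each with its own linear classifier, and the detection loss is minimised on every block independently in addition to the whole map. The block size and the plain logistic loss are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridLoss(nn.Module):
    """Per-block classifiers plus a holistic classifier over a feature map."""
    def __init__(self, channels, height, width, block=2):
        super().__init__()
        self.block = block
        n_blocks = (height // block) * (width // block)
        self.block_cls = nn.ModuleList(
            [nn.Linear(channels * block * block, 1) for _ in range(n_blocks)])
        self.full_cls = nn.Linear(channels * height * width, 1)

    def forward(self, feat, labels):
        # feat: [N, C, H, W], labels: [N] in {0, 1}
        n, c, h, w = feat.shape
        losses = [F.binary_cross_entropy_with_logits(
            self.full_cls(feat.flatten(1)).squeeze(1), labels.float())]
        i = 0
        for y in range(0, h - self.block + 1, self.block):
            for x in range(0, w - self.block + 1, self.block):
                patch = feat[:, :, y:y + self.block, x:x + self.block].flatten(1)
                logits = self.block_cls[i](patch).squeeze(1)
                losses.append(F.binary_cross_entropy_with_logits(logits, labels.float()))
                i += 1
        return sum(losses)
```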
Tree-based Coarsening and Partitioning of Complex Networks
Many applications produce massive complex networks whose analysis would
benefit from parallel processing. Parallel algorithms, in turn, often require a
suitable network partition. For solving optimization tasks such as graph
partitioning on large networks, multilevel methods are preferred in practice.
Yet, complex networks pose challenges to established multilevel algorithms, in
particular to their coarsening phase.
One way to specify a (recursive) coarsening of a graph is to rate its edges
and then contract the edges as prioritized by the rating. In this paper we (i)
define weights for the edges of a network that express the edges' importance
for connectivity, (ii) compute a minimum weight spanning tree with respect to these weights, and (iii) rate the network edges based on the conductance values of the spanning tree's fundamental cuts. To this end, we also (iv) develop the first optimal linear-time algorithm to compute the conductance values of all fundamental cuts of a given spanning tree. We integrate
the new edge rating into a leading multilevel graph partitioner and equip the
latter with a new greedy postprocessing for optimizing the maximum
communication volume (MCV). Experiments on bipartitioning frequently used
benchmark networks show that the postprocessing already reduces MCV by 11.3%.
Our new edge rating further reduces MCV by 10.3% compared to the previously best rating, with the postprocessing in place for both ratings. In total, with a modest increase in running time, our new approach reduces the MCV of complex network partitions by 20.4%.
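To make the edge-rating step concrete, here is a naive Python sketch that computes the conductance of every fundamental cut of a given spanning tree by recomputing each cut from scratch; the paper's contribution is a linear-time algorithm that computes all of these values at once. The input encoding is an assumption for illustration.

```python
from collections import defaultdict

def fundamental_cut_conductances(edges, tree_edges):
    """For each spanning-tree edge, removing it splits the vertices into two
    sides S and V\\S; conductance(S) = weight of cut(S, V\\S) / min(vol(S), vol(V\\S)).

    edges: list of (u, v, weight) for the whole graph
    tree_edges: list of (u, v) edges of a spanning tree of the same graph
    """
    tree_adj = defaultdict(list)
    for u, v in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)

    def side_of(start, blocked):
        # vertices reachable from `start` in the tree without crossing `blocked`
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in tree_adj[x]:
                if (x, y) != blocked and (y, x) != blocked and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    conductances = {}
    for u, v in tree_edges:
        side = side_of(u, (u, v))
        cut = vol_s = vol_rest = 0.0
        for a, b, w in edges:
            if (a in side) != (b in side):
                cut += w
            vol_s += w * ((a in side) + (b in side))          # weighted degree mass in S
            vol_rest += w * ((a not in side) + (b not in side))
        conductances[(u, v)] = cut / min(vol_s, vol_rest)
    return conductances
```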